The LIMSI 1997 Hub-4E Transcription System

Authors

  • Jean-Luc Gauvain
  • Lori Lamel
  • Gilles Adda

Abstract

In this paper we report on the LIMSI system used in the Nov’97 Hub-4E benchmark test on transcription of American English broadcast news shows. There are two main differences from the LIMSI system developed for the Nov’96 evaluation. The first concerns the preprocessing stages for partitioning the data, and the second concerns a reduction in the number of acoustic model sets used to deal with the various acoustic signal characteristics. The LIMSI system for the November 1997 Hub-4E evaluation is a continuous mixture density, tied-state cross-word context-dependent HMM system. The acoustic models were trained on the 1995 and 1996 official Hub-4E training data containing about 80 hours of transcribed speech material. The 65K-word trigram language models were trained on 155 million words of newspaper texts and 132 million words of broadcast news transcriptions. The test data is segmented and labeled using Gaussian mixture models, and non-speech segments are rejected. The speech segments are classified as telephone or wide-band, and according to gender. Decoding is carried out in three passes, with a final pass incorporating cluster-based test-set MLLR adaptation. The overall word transcription error of the Nov’97 unpartitioned evaluation test data...
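
The GMM-based labeling step described in the abstract can be illustrated with a small sketch. It assumes that segment boundaries and acoustic feature frames are already available, that each class (speech vs. non-speech, then telephone vs. wide-band) is modeled by a diagonal-covariance GMM, and that a segment receives the label whose GMM gives it the highest total log-likelihood. The names `DiagGMM`, `classify_segment` and `label_segments`, and all model parameters, are hypothetical toy values; this is not the LIMSI implementation, and it does not show boundary detection or gender classification (a second model set would be applied the same way).

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

Frame = Sequence[float]


@dataclass
class DiagGMM:
    """Diagonal-covariance GMM (hypothetical; toy hand-set parameters, not trained)."""
    weights: List[float]          # mixture weights, assumed to sum to 1
    means: List[List[float]]      # one mean vector per component
    variances: List[List[float]]  # one variance vector per component

    def frame_log_likelihood(self, x: Frame) -> float:
        """log p(x) = log sum_k w_k N(x; mu_k, diag(var_k)), computed with log-sum-exp."""
        scores = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            log_gauss = sum(
                -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
                for xi, m, v in zip(x, mu, var)
            )
            scores.append(math.log(w) + log_gauss)
        top = max(scores)
        return top + math.log(sum(math.exp(s - top) for s in scores))


def classify_segment(models: Dict[str, DiagGMM], frames: List[Frame]) -> str:
    """Return the label whose GMM gives the whole segment the highest log-likelihood."""
    def total(label: str) -> float:
        return sum(models[label].frame_log_likelihood(f) for f in frames)
    return max(models, key=total)


def label_segments(
    segments: List[List[Frame]],
    speech_models: Dict[str, DiagGMM],
    class_models: Dict[str, DiagGMM],
) -> List[Optional[str]]:
    """Reject non-speech segments (None), then classify the remaining speech segments."""
    labels: List[Optional[str]] = []
    for frames in segments:
        if classify_segment(speech_models, frames) != "speech":
            labels.append(None)  # rejected: would not be passed to the decoder
        else:
            labels.append(classify_segment(class_models, frames))
    return labels


# Toy 2-D "features" and models, purely illustrative.
speech_models = {
    "speech": DiagGMM([0.5, 0.5], [[1.0, 1.0], [3.0, 3.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "non-speech": DiagGMM([1.0], [[-2.0, -2.0]], [[1.0, 1.0]]),
}
band_models = {
    "telephone": DiagGMM([1.0], [[1.0, 1.0]], [[0.5, 0.5]]),
    "wide-band": DiagGMM([1.0], [[3.0, 3.0]], [[0.5, 0.5]]),
}
segments = [
    [[1.1, 0.9], [0.8, 1.2]],      # close to the "telephone" model
    [[-2.1, -1.8], [-1.9, -2.2]],  # close to the "non-speech" model
    [[2.9, 3.2], [3.1, 2.8]],      # close to the "wide-band" model
]
print(label_segments(segments, speech_models, band_models))
# prints ['telephone', None, 'wide-band']
```

Scoring the whole segment rather than individual frames lets each decision rest on many frames at once. A working system would additionally need trained models, a segmentation step, and a rejection policy, none of which are shown here.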

Similar resources

The LIMSI 1998 Hub-4E Transcription System

In this paper we report on our Nov98 Hub-4E system, which is an extension of our Nov97 system[4]. The LIMSI system for the November 1998 Hub-4E evaluation is a continuous mixture density, tied-state cross-word context-dependent HMM system. The acoustic models were trained on the 1995, 1996 and 1997 official Hub-4E training data containing about 150 hours of transcribed speech material. 65K word...

The LIMSI SDR System for TREC-8

In this paper we report on our TREC-8 SDR system, which combines an adapted version of the LIMSI 1998 Hub-4E transcription system for speech recognition with an IR system based on the Okapi term weighting function. Experimental results are given in terms of word error rate and average precision for both the SDR’98 and SDR’99 data sets. In addition to the Okapi approach, we also investigated a Mar...

The LIMSI 1999 Hub-4E Transcription System

In this paper we report on the LIMSI 1999 Hub-4E system for broadcast news transcription. The main difference from our previous broadcast news transcription system is that a new decoder was implemented to meet the 10xRT requirement. This single-pass 4-gram dynamic network decoder is based on a time-synchronous Viterbi search with dynamic expansion of LM-state conditioned lexical trees, and with...

The LIMSI SDR System for TREC-9

In this paper we describe the LIMSI Spoken Document Retrieval system used in the TREC-9 evaluation. This system combines an adapted version of the LIMSI 1999 Hub-4E transcription system for speech recognition with text-based IR methods. Compared with the LIMSI TREC-8 system, this year’s system is able to index the audio data without knowledge of the story boundaries using a double windowing app...

Transcription and indexation of broadcast data

In this paper we report on recent research on transcribing and indexing broadcast news data for information retrieval purposes. The system described here combines an adapted version of the LIMSI 1998 Hub-4E transcription system for speech recognition with text-based IR methods. Experimental results are reported in terms of recognition word error rate and mean average precision for both the TREC ...

Publication date: 1998